Abstract


Conventional Earth Observation Payload Data Ground Segments (PDGS) continuously receive variable requests for data processing and distribution. However, their architecture was conceived to run on the premises of satellite operators and therefore has intrinsic limitations when offering variable services. In this chapter, we consider cloud computing technology as an alternative for offering variable services. For that purpose, a cloud infrastructure based on OpenNebula and the PDGS used in the Deimos-2 mission were adapted, with the objective of optimizing the PDGS using the ENTICE open-source middleware. Preliminary results with a realistic satellite recording scenario are presented.


Keywords: Earth Observation, distributed systems, cloud computing, ENTICE project, 
gs4EO 


1. Introduction 


Traditionally, Earth Observation systems have been operated by governments and public organizations, the primary investors being the US, China, Russia, Japan and Europe, mainly because of shared worldwide objectives, such as climate change and sustainable development, and objectives at national level.


However, in 2015 and 2016 the paradigm of Earth Observation from space was changing with the globalization of the market, the evolution of information and communication technologies and the high investment of private entities in the field.


This boost in commercial interest in Earth Observation can be explained by the parallel evolution of three main pillars, as stated by Denis et al. [1]:
1. Increased performance of commercial satellites, meeting defence needs for very high resolution products, i.e. resolutions between 0.25 and 1 m.


2. The development of hybrid procurement schemes between private and public customers. 


3. The appearance of the New Space movement, which started in Silicon Valley, attracted the interest of investors and contributed to the creation and entrance of new actors in the space sector.


To these, we would add the dedicated EO budgets of new countries, such as Kazakhstan, Venezuela and Vietnam; the increased budgets of new EO programmes in India, China and South Korea [2]; and the fast evolution of information and communication technologies, which has facilitated the creation of new applications that require large amounts of information in the shortest possible time. This contributed to the evolution of the space sector in two ways: (a) the evolution of sensors to provide higher performance at a lower cost and (b) the launch of more satellites to cover the demand for information. The latter explains the increase in satellite launches in recent years and the interest of satellite operators in operating satellite constellations in order to reduce the revisit time and offer more coverage of the land surface.
Evidence of this is the number of EO satellites launched between 2006 and 2015: 163 satellites over 50 kg were launched for civil and commercial applications, generating $18.4 billion in manufacturing market revenues, whereas 419 satellites are expected to be launched over the next decade (2016-2025), generating $35.5 billion in manufacturing revenues. In terms of EO data sales, the market reached $1.7 billion in 2015 and is expected to reach $3 billion in 2025. This amounts to $12.2 billion in total revenue in the decade 2006-2015 and $24 billion in the decade 2016-2025 [3]. The generated data are used, for instance, to accumulate spatial and temporal records of the world itself and of the events and changes that occur in it, in a diverse range of applications: security, maritime, agriculture, energy and emergency, among others [4].


However, the infrastructures used to manage EO data are still based on traditional EO systems which, because of their original scope of application, rely on traditional on-site infrastructures or data centers. Their architecture was designed to be monolithic and hosted in a single, localized infrastructure.


Now, recording data from Earth observations generates massive amounts of spatiotemporal geospatial information that has to be intensively processed to meet a variable and increasing demand. This is a handicap for traditional data centers, since they were not designed to manage variable amounts of data: they were designed and sized to handle a certain data volume. They are therefore limited in terms of flexibility and scalability [5]. Storing increasing amounts of data is also a challenge, since the recordings are maintained by their owners over time [6].


Traditional Earth Observation Payload Data Ground Segments (PDGS) present the following 
limitations to cover the demands of the current EO market: 


i. Traditional infrastructures are neither flexible nor easily scalable.


ii. There is a risk of oversizing or undersizing the infrastructure when the demand for services is highly variable.


iii. They make the cost of acquiring recent images of the Earth very high. 
iv. Customers cannot access the information they need directly or quickly, because it has to be processed and distributed ad hoc.


Cloud computing technology can address these drawbacks and improve EO services because it is elastic and scalable, works on demand through virtualization of resources, offers virtually unlimited storage and computing capacity, is connected worldwide and is based on a pay-per-use model [7, 8].
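The oversizing and undersizing trade-off behind these points can be made concrete in a few lines of Python. The following minimal sketch, using purely hypothetical hourly demand figures (not data from this work), compares a segment sized for a fixed number of processing nodes with an idealised pay-per-use cloud whose capacity follows demand exactly; the class and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class ProvisioningReport:
    paid_node_hours: int    # capacity that is paid for
    idle_node_hours: int    # paid capacity that is not used
    unmet_node_hours: int   # demand that cannot be served on time


def fixed_capacity(demand: list[int], nodes: int) -> ProvisioningReport:
    """On-premises segment sized once for a constant number of processing nodes."""
    idle = sum(max(nodes - d, 0) for d in demand)
    unmet = sum(max(d - nodes, 0) for d in demand)
    return ProvisioningReport(nodes * len(demand), idle, unmet)


def elastic_capacity(demand: list[int]) -> ProvisioningReport:
    """Idealised pay-per-use cloud: capacity follows demand hour by hour.

    VM deployment delays are ignored, so this is a best case.
    """
    return ProvisioningReport(sum(demand), 0, 0)


# Hypothetical hourly demand, in processing nodes (not data from the EOD pilot)
demand = [2, 2, 8, 3, 1, 6]
print(fixed_capacity(demand, nodes=4))   # paid 24, idle 8, unmet 6
print(elastic_capacity(demand))          # paid 22, idle 0, unmet 0
```

With the same hypothetical demand, sizing the fixed segment for the peak (`nodes=8`) removes the unmet demand but raises idle capacity to 26 of the 48 paid node-hours, which is the oversizing risk listed above. The elastic case is optimistic because it ignores VM deployment time, one of the cloud limitations discussed next.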


Nevertheless, the current cloud computing technology still presents some limitations: 


i. Virtual machine images (VMIs) are not optimized and are often highly oversized, which increases the cost of using the infrastructure and hampers dynamic resource provisioning.


ii. The deployment of virtual machines (VMs) in the cloud is not real time. Deployment normally takes between 10 and 20 minutes, which directly affects the flexibility and dynamic scalability of the system.


iii. Although the pay-per-use model should intrinsically reduce costs, since customers pay only for what they use, the costs of using cloud computing are still high.


iv. A few major worldwide players, such as Amazon, Google, Microsoft and IBM, dominate the offer of cloud services, and migrating a system from one cloud infrastructure to another is difficult (vendor lock-in). This limits the democratization of these services and creates an entry barrier for new cloud providers.


Within the ENTICE H2020 project (project no. 644179), we intend to demonstrate that processing the data recorded from Earth observations in a cloud environment with the ENTICE middleware improves efficiency and overcomes the critical barriers of cloud computing while meeting data processing needs. Among other advantages, ENTICE provides independence from a specific infrastructure provider and facilitates the distribution of VMs across distributed infrastructures.


In this work, we present the implementation of the Earth Observation Data (EOD) pilot, which mainly consists of implementing in the cloud the Ground Segment for Earth Observation (gs4EO) suite, a commercial product of Deimos [9] that is currently operational in the Deimos-2 satellite mission [10].


For this purpose, we reproduce a real operational scenario of the Deimos-2 satellite with its ground segment running in a federated cloud infrastructure, from which we obtain real performance metrics and system requirements for normal operations with the satellite. Through this experimentation, we demonstrate the EOD concept as a solution for the new EO market paradigm.


2. Earth Observation Data Processing and Distribution Pilot 


2.1. ENTICE environment 


To facilitate the implementation in the cloud, the EOD pilot makes use of the ENTICE middleware [11], which provides autoscaling and flexibility for the ingestion of satellite imagery, its processing and its distribution to end users with variable demands. Kecskemeti et al. [12] introduced the ENTICE approach to solve these problems. The ENTICE environment consists of a ubiquitous repository-based technology that provides optimised virtual machine (VM) image creation, assembly, migration and storage for federated clouds. The ENTICE webpage can be found in [13].


ENTICE facilitates the implementation of cloud applications by simplifying the creation of lightweight virtual machine images (VMIs) by means of functional descriptors. These functional descriptors define the VMIs at a high, functional level and contribute to defining the system Service Level Agreement (SLA), which guides the optimization of the VMIs in terms of performance, cost, size and required quality of service (QoS). The VMIs are then automatically decomposed and distributed to meet the application runtime requirements. In addition, ENTICE facilitates elastic autoscaling. The benefits of using ENTICE are the following:


• Up to 80% reduction in storage.

• 95% elastic Quality of Service.

• VMI creation 25% faster.

• Reduced deployment costs.

• VMI optimization of up to 60%.

• VMI delivery 30% faster.

• Scalability and elasticity.

• Elimination of cloud infrastructure vendor lock-in.
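To illustrate how a functional descriptor and the SLA bounds derived from it might be represented and checked, a minimal sketch follows. ENTICE's actual descriptor format is not described in this chapter, so every field and function name below is hypothetical, and the SLA thresholds (1.5 GB, 300 s) are invented for the example. Only the CPU and RAM figures (Section 3.2) and the measured image sizes and deployment times checked against the bounds (Table 2) come from this work.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FunctionalDescriptor:
    """Hypothetical high-level description of one EOD VMI (not ENTICE's format)."""
    role: str                                   # e.g. "process4EO"
    packages: list[str] = field(default_factory=list)
    min_cpus: int = 1
    min_ram_gb: float = 1.0
    max_image_gb: Optional[float] = None        # SLA bound on image size
    max_deploy_s: Optional[int] = None          # SLA bound on deployment time


def sla_violations(desc: FunctionalDescriptor, image_gb: float, deploy_s: int) -> list[str]:
    """List the SLA bounds of `desc` that a built image does not meet."""
    problems = []
    if desc.max_image_gb is not None and image_gb > desc.max_image_gb:
        problems.append(f"image size {image_gb} GB > {desc.max_image_gb} GB")
    if desc.max_deploy_s is not None and deploy_s > desc.max_deploy_s:
        problems.append(f"deployment {deploy_s} s > {desc.max_deploy_s} s")
    return problems


# "gs4EO-processors" is a hypothetical package name; thresholds are illustrative
desc = FunctionalDescriptor("process4EO", packages=["gs4EO-processors"],
                            min_cpus=4, min_ram_gb=10, max_image_gb=1.5, max_deploy_s=300)
print(sla_violations(desc, image_gb=2.0, deploy_s=407))  # nonoptimized VMI: two violations
print(sla_violations(desc, image_gb=1.4, deploy_s=187))  # optimized VMI: []
```

Keeping the bounds in the descriptor, rather than in the build scripts, lets the same check be rerun whenever an image is rebuilt or re-optimized.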


In the EOD pilot, ENTICE is used as middleware between the federated infrastructure 
described in Section 3.1 and the gs4EO application software. 


2.2. EOD pilot description 


The Earth Observation Data Processing and Distribution Pilot (EOD) consists of the implementation of Elecnor Deimos' geo-data processing, storage and distribution platform for the Deimos-2 satellite using cloud technologies. The main functionalities of the system are the following:


• Acquisition of raw data: when the imagery data are ingested from the satellite into the ground station, the system is notified and the ingestion component automatically ingests the raw data into the cloud for processing.


• Processing of data: once ingested, the data are processed by the product processors. There are several processing levels to provide different products.


• Archiving and cataloguing of geo-images: the products obtained from processing the raw data are archived and catalogued in order to provide these images, or high added value services, to end users.


• Offering user services: this is the front end of the system. It allows end users to select the products that they want to visualize or download.
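The acquisition step depends on detecting new raw data at the ground station, and the Orchestrator described in Section 2.2.1 finds such data by polling. A minimal polling sketch follows, assuming that the ground station exposes its downlinked files as a directory and that raw files end in `.raw`; the directory layout, file pattern, polling interval and function names are hypothetical and not part of gs4EO.

```python
import shutil
import time
from pathlib import Path


def ingest_new_raw_files(station_dir: Path, shared_dir: Path, seen: set[str],
                         pattern: str = "*.raw") -> list[Path]:
    """Copy raw files not yet ingested from a ground-station folder to shared storage."""
    shared_dir.mkdir(parents=True, exist_ok=True)
    ingested = []
    for src in sorted(station_dir.glob(pattern)):
        if src.name in seen:
            continue
        dst = shared_dir / src.name
        shutil.copy2(src, dst)   # copy first, then record the file as ingested
        seen.add(src.name)
        ingested.append(dst)
    return ingested


def poll_forever(station_dir: Path, shared_dir: Path, interval_s: float = 60.0) -> None:
    """Poll the station folder at a fixed interval (hypothetical default of 60 s)."""
    seen: set[str] = set()
    while True:
        for path in ingest_new_raw_files(station_dir, shared_dir, seen):
            # A real ingestion component would notify the Orchestrator here
            print(f"ingested {path}")
        time.sleep(interval_s)
```

A file that is still being written by the ground station could be copied half-complete; a real ingestion component would need a completion marker or a stable-size check before copying, which this sketch omits.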


Optimization of an Earth Observation Data Processing and Distribution System
http://dx.doi.org/10.5772/intechopen.71423


Figure 1. Earth Observation Data Processing and Distribution pilot (EOD)'s architecture (diagram labels: Raw Data, Input Data, archive4EO, Shared Storage, process4EO node, Requests, Images; legend: work flow, data flow).


2.2.1. EOD architecture 


The main objectives of the EOD pilot are to process real data from the Deimos-2 satellite in a realistic scenario of normal operation and to validate the processing chain module as part of the cloud infrastructure. Ramos and Becedas [14] proposed an original architecture of the gs4EO suite for implementation in the cloud. Based on that work, the architecture of the EOD pilot has been redesigned and implemented, see Figure 1.


The architecture is composed of the following components: 


• monitor4EO: a ground station monitor that ingests the available raw data from the ground stations into the cloud system. It contains an Orchestrator, which manages the tasks of the different modules.


• process4EO server: the Orchestrator, i.e. the component that manages the tasks to be performed by all the modules of the architecture running in the cloud infrastructure. The Orchestrator has the following functions:


◦ To identify which outputs shall be generated by the processors.


◦ To generate the Job Orders. These eXtensible Markup Language (XML) files contain all the information that the processors need: the interfaces, the addresses of the folders in which the input information for the processors is located, the folders to which the outputs of the processors have to be sent, and the format in which the processors generate their output.


◦ To find data in the ground stations (polling) to be ingested into a shared storage unit in the cloud for distribution to the processing chain.


◦ To control the processing chain by communicating with the product processors.

◦ To manage the archive and catalogue.


Multi-purposeful Application of Geospatial Data


• process4EO node: composed of different software modules that process the raw data, and the products of previous levels, to produce image products. Figure 2 depicts the image processing pipeline. The four most important operations are the following:


◦ Calibration (L0 and L0R processing levels): converts the pixel values from instrument digital counts into radiance units.


◦ Geometric correction (L1A processing level): eliminates distortions due to misalignments of the sensors in the focal plane geometry.


◦ Geolocation (L1BR processing level): computes the geodetic coordinates of the input pixels.


◦ Orthorectification (L1CR processing level): produces orthophotos with vertical projection, free of distortions.


Figure 2. EOD's pipeline (Processing L0 → Processing L0R → Processing L1A → Processing L1BR → Processing L1CR, followed by Archiving and Cataloguing).
• archive4EO: the module in which the processed images are stored and catalogued for distribution. It offers a Catalogue Service for the Web (CSW) interface.


• user4EO: a web service through which end users can access the products.


• Shared storage: a storage module, shared by all the modules of the architecture, in which all the inputs and outputs of the different modules are stored.
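To make the role of the Job Orders more tangible, the sketch below serializes one Job Order with Python's standard `xml.etree.ElementTree` module, chaining the output folder of one level to the input of the next through the shared storage. The element and attribute names, the folder paths and the `GeoTIFF` format are hypothetical placeholders, not the gs4EO schema, and the interface descriptions mentioned above are omitted.

```python
import xml.etree.ElementTree as ET


def build_job_order(processor: str, input_dirs: list[str], output_dir: str,
                    output_format: str) -> str:
    """Serialize one Job Order as XML (hypothetical element and attribute names)."""
    root = ET.Element("JobOrder", {"processor": processor})
    inputs = ET.SubElement(root, "Inputs")
    for folder in input_dirs:
        ET.SubElement(inputs, "InputFolder").text = folder
    output = ET.SubElement(root, "Output", {"format": output_format})
    ET.SubElement(output, "OutputFolder").text = output_dir
    return ET.tostring(root, encoding="unicode")


# The L1A processor reads the L0R products and writes to its own folder
print(build_job_order("L1A", ["/shared/L0R"], "/shared/L1A", "GeoTIFF"))
```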


3. Experiment setup 


3.1. Testing infrastructure 


The testing infrastructure used in the experiment consists of hardware deployed in three different locations and managed in a federated manner: the DMU infrastructure (at Deimos UK, United Kingdom), the DMS infrastructure (at Deimos Space, Spain) and the DME infrastructure (at Deimos Engenharia, Portugal). The hardware resources deployed at each location are described in Table 1. The ENTICE middleware was installed in the DMU infrastructure, which acts as the master. It also contains an object store with an Amazon Simple Storage Service (Amazon S3) interface for cloud bursting. The DMS and DME infrastructures are slaves of the DMU infrastructure and also contain object stores with Amazon S3 interfaces. A block diagram describing the interrelations of the testing infrastructure is depicted in Figure 3. The infrastructure was virtualized with OpenNebula, using Kernel-based Virtual Machine (KVM) as the hypervisor. The virtual machines were created with Packer, whereas their automatic deployment was done with Ansible.
| Location | Name | Model | CPU | RAM (GB) | HD (GB) | OS |
|---|---|---|---|---|---|---|
| DMU | Node-1 | Dell Optiplex 790 | Intel Core i7-2600, 3.4 GHz | 8 | 160 | CentOS 7.2.1511 |
| DMU | Node-2 | Dell Optiplex 790 | Intel Core i7-2600, 3.4 GHz | 16 | 250 | CentOS 7.2.1511 |
| DMU | OpenNebula-fe | Dell Optiplex 745 | Intel Core 2 6300, 1.86 GHz | 4 | 250 | CentOS 7.2.1511 |
| DMS | Node-2 | Dell | Intel 8 Core, 2.37 GHz | 16 | 2048 | CentOS 7.2.1511 |
| DMS | Node-1 | Dell | Intel 2 Core, 3 GHz | 6 | 230 | CentOS 7.2.1511 |
| DME | Node-1 | HP | AMD Athlon 64 X2 Dual Core 3800+ | 4 | 256 | CentOS 7.2.1511 |

Table 1. Hardware resources in the testing infrastructure.


Figure 3. Block diagram of the testing infrastructure.


Figure 4 shows a diagram of the process for automatically generating the virtual machines that constitute the EOD software. The image building process takes advantage of the functionalities provided by Packer and Ansible to build KVM images. The virtual images are based on the CentOS 6 Linux distribution and are stored in qcow2 format. This automation step comprises several files:


• Execution script: this script, developed in Python, launches the creation of the machine image with Packer. It receives a JSON file with all the variables used in the building process (e.g. the user configuration, software repositories, Kickstart file and Ansible playbook) and configures all the required fields in the Kickstart file. It can build all the types of VMIs required to deploy the EOD software: archive4EO, monitor4EO and process4EO. The type of virtual machine to generate is specified in the configuration file.

• Packer template: a JSON file that provides all the information needed to create the virtual machine in Packer. It contains the format, the instructions and the parameters for building a VMI using KVM. Its provisioners define the Ansible scripts or recipes for configuring the machine and installing the applications.


• Ansible playbook: these files are “recipes” for installing the EOD software in the virtual machines. Each playbook is a YAML file whose commands, expressed in a simplified language, describe a configuration or a process. It contains the information needed to configure the system, install the EOD software and add the functionalities required to work in the cloud environment (contextualization).
Figure 4. Diagram of the automatic generation of the EOD virtual machines.

The Python script receives the configuration file and launches the Packer command after configuring some parameters in the Kickstart file. The Packer command takes the template and runs all the builds within it to generate a set of artefacts and build the image in KVM. Once the image is built, Packer launches all the provisioners (Ansible) contained in the template. Ansible carries out several steps: it configures all the repositories, installs all the dependencies and software packages of the EOD modules, configures the EOD software and installs a context package to deploy the VMI in OpenNebula.
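The execution script itself is not listed in this work. The following is a minimal sketch of such a driver, assuming a JSON configuration with the hypothetical keys `vmi_type`, `packer_vars`, `kickstart_template`, `kickstart_out` and `kickstart`, a Kickstart template that uses `${name}` placeholders, and a Packer JSON template named `eod-template.json` (also hypothetical) that declares every user variable passed with `-var`. Only the `packer build -var key=value template` syntax is real Packer command-line usage; everything else is illustrative.

```python
import json
import subprocess
from pathlib import Path
from string import Template

VMI_TYPES = {"archive4EO", "monitor4EO", "process4EO"}


def render_kickstart(template_path: Path, out_path: Path, values: dict) -> None:
    """Replace ${name} placeholders in a Kickstart template with configured values."""
    text = Template(template_path.read_text()).substitute(values)
    out_path.write_text(text)


def packer_command(cfg: dict, template: str = "eod-template.json") -> list:
    """Build the `packer build` call; each -var sets a user variable of the template."""
    if cfg["vmi_type"] not in VMI_TYPES:
        raise ValueError(f"unknown VMI type: {cfg['vmi_type']}")
    cmd = ["packer", "build"]
    for key, value in sorted(cfg["packer_vars"].items()):
        cmd += ["-var", f"{key}={value}"]
    # The template is assumed to declare `vmi_type` and use it to pick the playbook
    cmd += ["-var", f"vmi_type={cfg['vmi_type']}", template]
    return cmd


def main(config_file: str, dry_run: bool = True) -> list:
    cfg = json.loads(Path(config_file).read_text())
    render_kickstart(Path(cfg["kickstart_template"]), Path(cfg["kickstart_out"]),
                     cfg["kickstart"])
    cmd = packer_command(cfg)
    if not dry_run:
        # Packer builds the qcow2 image with KVM and then runs the Ansible provisioner
        subprocess.run(cmd, check=True)
    return cmd
```

By default `main()` only renders the Kickstart file and returns the command, so a configuration can be checked without a Packer installation; calling it with `dry_run=False` launches the build.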


The experimental data were recorded with JMeter™ [15] and Nagios® [16]. JMeter is installed in the Node and Nagios in a virtual machine inside the federated cloud. These tools are used to monitor the cloud resources and their status and to extract the experimental data.


3.2. Experiment description 


The aim of this experiment is to demonstrate the feasibility of implementing the EOD system in the cloud and to show how its behaviour improves after ENTICE optimizes the process4EO node.


The experiment consists of a realistic recording with the Deimos-2 satellite, in which a real acquisition is ingested into the EOD pilot. The raw data are then processed with the EOD pilot before and after the optimization process. The results are compared to evaluate the functionality of the optimized system with respect to the nonoptimized system and to validate the implementation of the gs4EO modules in the cloud.


VMI size, VMI creation time, VMI delivery time and VMI deployment time are the metrics selected to compare the performance of the system before and after the optimization process.


To demonstrate that the functionality of the system remains the same after the optimization, the following metrics are evaluated: processing time, imagery product size, CPU use per process and memory use per process.


The raw data used in the experiment are approximately 3 GB in size (3090 MB, see Table 3) and contain four multispectral bands (R, G, B and NIR) and one panchromatic band. The recorded area of the land surface is a rectangle of 8.86 km × 16.59 km (about 147 km²).


The raw data are managed and processed to automatically obtain the following products:

• L0: decoded raw data.

• L0R: transformation of L0 into an image.

• L1A: geolocated and radiometrically calibrated image.

• L1BR: resampled image with more precise geolocation.

• L1CR: orthorectified image.
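Figures 5 and 6 report the processing time of each level. The sketch below shows one simple way an orchestrator could chain the five levels in order, feeding each level the previous level's output, and record the wall-clock time of each stage with `time.perf_counter`. The stage functions are placeholders; in the EOD pilot the processors are separate gs4EO modules driven by Job Orders, not in-process Python calls.

```python
import time
from typing import Callable

# Processing levels in the order of the EOD pipeline
LEVELS = ["L0", "L0R", "L1A", "L1BR", "L1CR"]

Stage = Callable[[object], object]


def run_chain(raw: object, stages: dict[str, Stage]) -> tuple[object, dict[str, float]]:
    """Feed each level the previous level's output; return the last product and seconds per level."""
    timings: dict[str, float] = {}
    product = raw
    for level in LEVELS:
        start = time.perf_counter()
        product = stages[level](product)
        timings[level] = time.perf_counter() - start
    return product, timings


if __name__ == "__main__":
    # Placeholder stages that just tag the product with their level name
    demo = {lvl: (lambda p, lvl=lvl: p + [lvl]) for lvl in LEVELS}
    product, timings = run_chain([], demo)
    print(product)
    print(f"total: {sum(timings.values()):.6f} s")
```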


The virtual resources used in the experiment were the following: a virtual machine with 300 GB of disk, 10 GB of RAM and four 32-bit CPUs; a shared storage of 99 GB; and an additional storage volume of 50 GB. The same resources were used in both experiments (EOD before and after optimization) to facilitate the comparison.
4. Experiment results


First, the virtual machine images of the EOD pilot were created, delivered and deployed in the cloud. Then, the virtual machine of process4EO was optimized and its VMI was again created, delivered and deployed. The time spent in each step is given in Table 2.


These results show the performance improvement of the system before runtime, i.e. up to its deployment: a reduction of 30% in VMI size, 37.3% in VMI creation time, 34.53% in VMI delivery time and 54.05% in deployment time.
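The reductions quoted above follow directly from Table 2 as 100 × (before - after) / before, after converting the hh:mm:ss values to seconds. The short script below recomputes them; it only restates the table and adds no new measurements.

```python
def to_seconds(hms: str) -> int:
    """Convert an hh:mm:ss (or h:mm:ss) string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return 3600 * hours + 60 * minutes + seconds


def reduction_pct(before: float, after: float) -> float:
    """Relative reduction in percent, rounded to two decimals as in Table 2."""
    return round(100 * (before - after) / before, 2)


# metric: (nonoptimized VM, optimized VM), values copied from Table 2
TABLE_2 = {
    "VMI size": (2.0, 1.4),
    "VMI creation time": (to_seconds("00:19:42"), to_seconds("00:12:21")),
    "VMI delivery time": (to_seconds("00:20:25"), to_seconds("00:13:22")),
    "VMI deployment time": (to_seconds("0:06:47"), to_seconds("0:03:07")),
}

for metric, (before, after) in TABLE_2.items():
    print(f"{metric}: {reduction_pct(before, after)} %")
# prints 30.0 %, 37.31 %, 34.53 % and 54.05 %, matching the last row of Table 2
```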


Next, the raw data recorded by the satellite were ingested into both the original and the optimized EOD pilot, and the response of both systems was measured at runtime. The processing times of the satellite imagery in the original EOD pilot and in the EOD pilot with the optimized processing chain are shown in Figures 5 and 6, respectively. The processing time of each level is similar in both experiments, as is the total time to process the raw data up to the orthorectification level (L1CR): 33.95 s in the nonoptimized system and 35.75 s in the optimized system. This difference of 1.8 s (about 5%) is not substantial and may be caused by OpenNebula processes, or by the cloud itself using some resources while the experiments were running. In addition, the sizes of the imagery products obtained in both experiments are given in Table 3: each product has exactly the same size in both experiments. These results demonstrate that the functionality of the system is intact after the optimization process, while the optimization provides benefits in the storage, creation, delivery and deployment of the system.


|  | VMI size (GB) | VMI creation time (hh:mm:ss) | VMI delivery time (hh:mm:ss) | VMI deployment time (hh:mm:ss) |
|---|---|---|---|---|
| Nonoptimized VM | 2 | 00:19:42 | 00:20:25 | 00:06:47 |
| Optimized VM | 1.4 | 00:12:21 | 00:13:22 | 00:03:07 |
| Reduction (%) | 30 | 37.31 | 34.53 | 54.05 |

Table 2. Metrics of the optimized and nonoptimized EOD pilot.


Figure 5. Processing time of the satellite imagery with the nonoptimized EOD system (processing time per stage, in seconds).


Figure 6. Processing time of the satellite imagery with the optimized EOD system.


| Data type | Raw data | L0 | L0R | L1A | L1BR | L1CR | Total products |
|---|---|---|---|---|---|---|---|
| Size of the products obtained with the nonoptimized system (MB) | 3090 | 764 | 789 | 749 | 1140 | 1130 | 4572 |
| Size of the products obtained with the optimized system (MB) | 3090 | 764 | 789 | 749 | 1140 | 1130 | 4572 |

Table 3. Imagery product sizes obtained with both the nonoptimized and the optimized EOD system.


Figure 7. CPU use per process in the nonoptimized EOD system.


Figure 8. CPU use per process in the optimized EOD system.
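The equivalence check behind Table 3 can be automated. The sketch below compares per-level product sizes (values in MB copied from Table 3) and also provides a SHA-256 helper for a stricter byte-level comparison, which was not part of the reported experiment; the function names are illustrative.

```python
import hashlib
from pathlib import Path

# Product sizes in MB per processing level, copied from Table 3
NONOPTIMIZED = {"L0": 764, "L0R": 789, "L1A": 749, "L1BR": 1140, "L1CR": 1130}
OPTIMIZED = {"L0": 764, "L0R": 789, "L1A": 749, "L1BR": 1140, "L1CR": 1130}


def size_mismatches(a: dict[str, int], b: dict[str, int]) -> list[str]:
    """Levels whose product size differs between two runs (or exists in only one)."""
    return sorted(lvl for lvl in a.keys() | b.keys() if a.get(lvl) != b.get(lvl))


def sha256_of(path: Path) -> str:
    """SHA-256 digest of a product file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


print(sum(NONOPTIMIZED.values()))                 # 4572, the "Total products" column
print(size_mismatches(NONOPTIMIZED, OPTIMIZED))   # []
```

Byte-level digests can differ between functionally equivalent runs if the processors embed timestamps or run identifiers in their products, so a size comparison like the one in Table 3 may be the more practical check.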


Furthermore, the CPU use is similar in both experiments for all the processing stages: Figure 7 shows the CPU used in processing the satellite imagery with the nonoptimized system, and Figure 8 shows the CPU used by the optimized system.

Figure 9. Memory use per process in the nonoptimized EOD system.

The memory used by the optimized system, in contrast, was lower: the memory use per process in the nonoptimized system is shown in Figure 9, and that of the optimized system in Figure 10.
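The per-process CPU and memory figures in Figures 7 to 10 were recorded with the monitoring tools described in Section 3.1. As an alternative for a quick local check, and assuming that each processing stage runs as a separate executable, the sketch below uses Python's standard `resource` module (Unix only) to measure the CPU time and peak resident memory of one child process. It is not the measurement method used in this work, and the stand-in workload is hypothetical.

```python
import resource
import subprocess
import sys


def run_measured(cmd: list[str]) -> dict[str, float]:
    """Run one processing stage as a child process and report its resource use (Unix only)."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    subprocess.run(cmd, check=True)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        # User + system CPU seconds consumed by this child alone (difference of totals)
        "cpu_s": (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime),
        # Largest resident set size of any child waited for so far: KiB on Linux,
        # bytes on macOS. It is a running maximum, so run stages in separate
        # measuring processes if per-stage peaks are needed.
        "max_rss": float(after.ru_maxrss),
    }


if __name__ == "__main__":
    # Stand-in workload; a real stage would be a gs4EO processor invocation
    print(run_measured([sys.executable, "-c", "sum(range(10**6))"]))
```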


These results obtained with the EOD pilot can be related with the new paradigms of the Earth 
Observation market stated in [1]. Table 4 describes how an approach of a PDGS system simi- 
lar to the EOD pilot could cover the main requirements of the new EO market. 
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Figure 10. Memory use per process in the optimized EOD system. 


| New paradigm requirements | EOD pilot |
|---|---|
| Cost optimization | Cost reduction by means of reduced storage of optimized VMIs, reduced creation time, reduced delivery time and reduced deployment time |
| Multi-sensor ground processing systems | Ground stations, ground control centers and data processing centers would take advantage of a rapid, agile, resilient and secure interconnected computer system in the cloud |
| Vertical integration | Global distributed infrastructure connecting all the stakeholders in an operational environment |
| Scalability | Elastically autoscale applications on cloud resources based on their fluctuating load, with optimized VM interoperability across cloud infrastructures and without provider lock-in |

Table 4. New paradigm requirements vs. EOD pilot approach.


5. Conclusions and future work 


In this work, the EOD pilot was successfully implemented in an experimental cloud infrastructure with the ENTICE middleware. The pilot was tested and promising results were obtained. These results indicate that real scenarios of satellite imagery management and processing can be carried out in the cloud with many advantages over traditional infrastructures. Furthermore, an optimization of the EOD pilot was carried out, achieving reductions of 30% in VMI size, 37.3% in VMI creation time, 34.53% in VMI delivery time and 54.05% in deployment time, while keeping the functionality of the system intact. This indicates that a PDGS implemented in the cloud in a manner similar to the EOD pilot can fulfill the requirements of the new Earth Observation market paradigm. Specifically, these results demonstrate that deploying an optimized PDGS in the cloud can reduce storage costs and reduce the time to user by shortening the creation, delivery and deployment times of the system. Besides, cloud-based ground stations can take advantage of a rapid, agile, resilient and secure interconnected system. In addition, the global operational environment provided by a cloud infrastructure facilitates both the global acquisition and the distribution of data, improving market efficiency. Finally, the system improves its scalability without vendor lock-in, covering the needs of emerging on-demand markets.


In future research, different realistic scenarios with variable demand for services will be tested. With these scenarios, we will evaluate the elastic behaviour of the ingestion of raw data into the system, the processing and the distribution of imagery products to users. Furthermore, a complete optimization of the system will be tested to evaluate the reduction in total repository storage size, which was not evaluated in this work. In addition, new metrics will be measured to validate the system for commercial deployment in the near future.